METHOD AND SYSTEM FOR ENABLING
REAL-TIME SPECKLE PROCESSING USING 
HARDWARE PLATFORMS 

This invention was made with Government support under 
Contract NNK06OM14C awarded by NASA. The Govern- 
ment has certain rights in this invention. 

FIELD OF THE INVENTION 

The field of the invention is optical image processing. In 
particular, the present invention relates to hardware and meth- 
ods for efficient algorithms for large-aperture optical image 
processing. 

BACKGROUND OF THE INVENTION 

The quality of images taken with long-range optical imag- 
ing systems can be severely degraded by atmospheric move- 
ments, such as turbulence and air movement, in the path 
between the region under observation and the imaging sys- 
tem. In particular, as distance increases, atmospheric turbu- 
lence is often the dominating source of image degradation in 
infrared and visible imaging applications. Assuming ideal 
observation conditions, the minimum distinguishable feature 
size that can be resolved using a given optical imaging system 
is bounded by the diffraction limit (1.22λ/D), where λ is the
wavelength and D is the aperture diameter. This diffraction limit suggests
that large-aperture optical imaging systems enable finer
image features to be resolved/distinguished. 

However, in large-aperture optical imaging systems, tur- 
bulence and air movement become the limiting factors long 
before the diffraction limit effects discussed above appear. In 
particular, the minimum distinguishable feature under turbulent
conditions is given by the equation 1.22λ/R0, where R0
may be as small as a few centimeters and may be dependent 
on the strength of the turbulence and air movement. Thus, 
there is a practical limit on the ability to image distant objects 
in background art large-aperture optical imaging systems.
Due to this, large-aperture optical imaging systems of the
background art: (1) have not been able to take full
advantage of the potential for increased resolution suggested 
by the diffraction limit; and (2) do not provide improvements 
in resolution and feature separation characteristics over 
smaller-aperture optical imaging systems. 
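
The gap between the two limits can be illustrated numerically. The sketch below is only an illustration: it treats D as the aperture diameter, multiplies the angular limit by a path length to obtain a feature size (a small-angle approximation), and uses hypothetical values for the wavelength, aperture, R0, and range that do not come from this description.

```python
def angular_resolution_rad(wavelength_m, limiting_length_m):
    """Angular resolution 1.22 * wavelength / length, in radians."""
    return 1.22 * wavelength_m / limiting_length_m


def feature_size_m(wavelength_m, limiting_length_m, range_m):
    """Smallest resolvable feature at a given range (small-angle approximation)."""
    return angular_resolution_rad(wavelength_m, limiting_length_m) * range_m


# Hypothetical values, chosen only for illustration.
wavelength = 550e-9    # visible light, in meters
aperture = 0.30        # 30 cm aperture diameter D
r0 = 0.03              # 3 cm turbulence length R0 ("a few centimeters")
target_range = 5000.0  # 5 km observation path

diffraction_limited = feature_size_m(wavelength, aperture, target_range)
turbulence_limited = feature_size_m(wavelength, r0, target_range)

print(f"diffraction-limited feature size: {diffraction_limited * 100:.2f} cm")
print(f"turbulence-limited feature size:  {turbulence_limited * 100:.2f} cm")
```

With these assumed numbers the turbulence-limited feature size is ten times coarser than the diffraction-limited one, simply because R0 is one tenth of D.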

For example, a very similar problem is faced by astrono- 
mers when trying to image the sky through the turbulent 
atmosphere of the earth using large telescopes. To overcome 
this limitation, special signal processing algorithms were 
developed that are capable of minimizing the effects of a 
turbulent atmospheric path by combining information from 
several images taken in a time sequence. A bispectral speckle 
imaging method described by C. J. Carrano, in “Speckle 
imaging over horizontal paths,” was presented at High Reso- 
lution Wavefront Control: Methods, Devices, and Applica- 
tions IV, 2002. Unlike astronomical optical imaging, the chal- 
lenges in large-aperture optical imaging systems for 
horizontal or slanted atmospheric paths are that the scenes are 
extended and the scene covers a very large visible angle. 
Thus, in general, the small-angle approximations typically 
used in astronomical applications cannot be directly applied 
to slanted path imaging systems. 

In addition, the background art includes other digital signal 
processing techniques that have been applied to degraded 
images in an attempt to correct the images to overcome atmo- 
spheric turbulence. In an article by B. R. Frieden, entitled: 
“An exact, linear solution to the problem of imaging through 
turbulence,” Opt. Comm. 150 (1998) 15, a sequence of two
short-exposure intensity images is taken without any refer- 
ence point sources. The images are then Fourier transformed 
and divided by linear equations based on two random point 
spread functions. The result is then inverse filtered to provide
an image of an object. However, a problem with this method 
is that the point spread functions associated with the turbu- 
lence are not known in an image due to the lack of any 
reference. This situation can cause further problems in recovering
an image taken through turbulence. Other examples of
background art in this technology area include, but are not 
limited to: U.S. Pat. No. 7,139,067 (Pohle et al.); U.S. Pat.
No. 7,120,312 (George); U.S. patent application Ser. No.
10/661,138; U.S. patent application Ser. No. 11/017,384
(Olivier et al.); and U.S. patent application Ser. No. 10/610,152
(Carrano et al.).

As another example of the above-discussed background art
(i.e., Carrano et al.), researchers at Lawrence Livermore
National Laboratory have refined the astronomical bispectral
speckle imaging methods and modified them for earth-based
use. FIG. 1 is an exemplary block diagram 100 of this
background art method. The method combines information
from several images, taken a short time apart from one
another. These can be a series of multiple short-exposure still
shots from a conventional camera or, more commonly, a
sequence of consecutive video frames. This information is
combined and processed by complex “averaging” procedures
in the frequency domain, where the magnitude and phase are
calculated independently and subsequently recombined in
the real space. However, on a personal computer (PC), this
method requires several seconds to analyze a single frame.
Thus, even though this bispectral method provides accurate
results, it must be accelerated in order to work in real time.

To accommodate the spatially varying point spread functions
experienced in earth-bound imaging, overlapping subfields
of the image are separately speckle processed and reassembled
to form the full field of the image. As shown in
FIG. 2A and FIG. 2B, what results is a method that produces
a single corrected image with quality near the diffraction
limit. In FIG. 2A, the image frame represents original,
degraded video image frame captures. FIG. 2B shows the effect on
the image frame after running the speckle imaging method on
the degraded images. The computational rate required is a
direct consequence of the large number of pixels in the image,
which must be transformed into the frequency domain (e.g.,
by the Fast Fourier Transform (FFT)) and then to the bispectral
domain. These transformations account for the majority
of the computational time in the execution of the speckle
algorithm.

If the above-discussed problems of the background art
could be overcome, numerous applications could benefit
from improvements in large-aperture optical imaging. The most
obvious applications are in the military field, particularly
intelligence, reconnaissance, and target designation. Moreover,
there are many civilian applications of this technology
as well, especially in the surveillance and homeland security
areas. Unfortunately, these atmospheric compensation algorithms
are very computationally intensive, which prevents
even top-of-the-line PCs from evaluating them in real time.
The necessary processing typically requires tens of seconds
to enhance a single frame. In addition, this processing-time
problem is worsened when video feeds are to
be processed, since real-time video requires several dozen
frames per second (e.g., a difference of two orders of magnitude).
Therefore, there is a need in the art for improved
computational methods and systems for large-aperture optical
imaging systems that would allow real-time or increased
performance.

SUMMARY OF THE INVENTION 

Embodiments of the invention enable real-time speckle 
processing of video feeds that further enables the speckle 
algorithm to be applied in numerous real-time applications. 
Features and advantages of embodiments of the invention will
become apparent from the following description. A broad 
representation of the invention is provided by the detailed 
description, which includes, but is not limited to: discussion, 
drawings and examples of specific embodiments. Various 
changes and modifications within the spirit and scope of the 
invention will become apparent to those skilled in the art from 
this description and by practice of the invention. 

Exemplary embodiments of the invention include hard- 
ware, software and machine readable mediums for an accel- 
erator for the Speckle atmospheric compensation algorithm. 
In particular, one embodiment of the invention is a method for 
fast computation of a Speckle Algorithm, comprising: initial- 
izing and setting up a plurality of memory locations; setting a 
frame counter to 0; inputting a present frame of a time 
sequenced image; computing a bispectrum of the present 
frame; adding the present bispectrum computation to previously
accumulated bispectrum computations and storing in
one of the plurality of memory locations; incrementing the 
frame counter; checking whether the frame counter equals the 
number of frames to be processed; incrementing the frame 
counter when the frame counter is less than the number of 
frames to be processed and returning to the step of inputting 
the present frame; setting the frame counter to zero when the 
frame counter is equal to the number of frames to be pro- 
cessed and computing the normalization of the accumulated 
bispectrum computations; computing an inverse bispectrum 
computation; outputting the inverse bispectrum and returning 
to the step of inputting the present frame. 

Another embodiment of the invention is a method for real- 
time computation of a Speckle algorithm incorporating a 
sliding window, comprising: initializing and setting up a plu- 
rality of memory locations; setting a frame counter to zero; 
inputting a present frame of a time sequenced image; com- 
puting the bispectrum of the present frame; determining 
whether the computed bispectrum buffer is full with accumu- 
lated bispectrum computations: when the computed bispec- 
trum buffer is not full, setting a next oldest computed bispec- 
trum frame to an oldest computed bispectrum frame, adding 
the computed bispectrum of the present frame to the previous
accumulated bispectrum frame computations and, if the computed
bispectrum buffer is still not full, incrementing the
frame counter and returning to the step of inputting the present
frame; and when the computed bispectrum buffer is full,
subtracting a next oldest computed bispectrum frame from
the accumulated bispectrum frame computations, setting a
next oldest computed bispectrum frame to an oldest computed
bispectrum frame, adding the computed bispectrum of
the present frame to the previous accumulated bispectrum
frame computations and, if the buffer is still not full, incrementing
the frame counter and returning to the step of inputting
the present frame; and when the computed bispectrum buffer 
is full, normalizing the accumulated computed bispectrums; 
inverting the normalized accumulated bispectrums; output- 
ting the normalized accumulated inverted bispectrums; incre- 
menting the frame counter and returning to the step of input- 
ting the present frame. 

Yet another embodiment of the invention is a system for
real-time computation of a method for a Speckle algorithm, as
recited in either of the above embodiments and further comprising:
a host computer; a PCI bridge; an oscillator; a transmitter
synthesizer; a serializer; a receiver; SRAM and
DRAM memory modules; and an FPGA.

Preferably, the system of the embodiment above further
comprises: input and output shielded connectors configured
to input and output video data; an equalizer and cable driver
connected to the input and output shielded connectors, respectively;
and the deserializer and serializer are connected to the
outputs of the equalizer and cable driver, respectively.

Preferably, in the system of the embodiment above, the
FPGA further comprises: a framing receiver and framing
transmitter connected to the outputs from the deserializer and
serializer, respectively; and a decoder and encoder connected
to the outputs from the framing receiver and framing transmitter,
respectively, and an SDRAM and the PCI Bridge are
connected to the FPGA, wherein the deserializer is connected
to the framing receiver and the serializer is connected to the
framing transmitter.

Preferably, in the system of the embodiment above, the FPGA
comprises a Speckle engine, and the Speckle engine further
comprises: a startup function and parameter register file configured
to control system operation; an extract tile function
connected to a Demean function and configured to provide
inputs to an Apodization window; a first two-dimensional
(2-D) Real-Complex FFT connected to the outputs of the
Apodization window; an Intensity function connected to the
2-D Real-Complex FFT; a Compute Bispectrum function
connected to outputs of the Intensity function; an Averaging
Unit connected to outputs of the Compute Bispectrum function;
a second two-dimensional (2-D) Real-Complex FFT
connected to outputs of the Averaging Unit; and an Apodization
Gain unit connected to outputs of the second two-dimensional
(2-D) Real-Complex FFT; wherein the startup function
is connected to the parameter register file and the Speckle
engine, the PCI bridge is connected to the startup function, and
the SDRAM is connected to the Speckle engine.

BRIEF DESCRIPTION OF THE DRAWINGS 

Embodiments of the invention can be described in greater
detail with the aid of the following drawings.

FIG. 1 is an exemplary block diagram of a background art 
Speckle Imaging Method. 

FIG. 2A shows an image frame representing original, degraded
video image frame captures.

FIG. 2B shows the effect on the image frame after running the
speckle imaging method on the degraded images.

FIG. 3A is an exemplary block diagram of an embodiment 
of a Bispectrum Computation Unit (BCU).

FIG. 3B is an exemplary block diagram of an embodiment 
of a Bispectrum Normalization Unit (BNU).

FIG. 4 is an exemplary block diagram combining the BCU 
and BNU to form the Compute Bispectrum function. 

FIG. 5A is an exemplary flow chart of the Speckle Algo- 
rithm of the background art. 

FIG. 5B is an exemplary flow chart of the re-partitioned
and accelerated Speckle Algorithm of an embodiment of the 
invention. 

FIG. 5C is an exemplary flow chart of the re-partitioned 
and accelerated Speckle Algorithm of an embodiment of
the invention after adding a Sliding Window.

FIG. 6A is an exemplary computational flow diagram for 
the Speckle algorithm of the background art. 

FIG. 6B is an exemplary computational flow diagram for
the re-partitioned and accelerated Speckle Algorithm. 

FIG. 7A is an exemplary board layout of the re-partitioned 
and accelerated Speckle Algorithm Demonstration. 

FIG. 7B is an exemplary detailed block diagram of the
re-partitioned and accelerated Speckle Algorithm. 

FIG. 8A shows a block diagram of an Accelerated Speckle
Demonstration System. 

FIG. 8B shows a parts list for an exemplary Accelerated 
Speckle Demonstration System, as shown in FIG. 8A.

FIG. 9 shows a performance comparison between an origi- 
nal software version of the Speckle Algorithm and the re- 
partitioned and accelerated Speckle Demonstration System 
showing how the demonstration system outperformed the
purely software approach by a factor of 40.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Generally speaking, three major computational bottlenecks
have prevented real-time processing capabilities from
being applied to the speckle algorithm. These bottlenecks are:
(1) the Fast Fourier Transforms (FFTs); (2) calculation of the
bispectrum; and (3) normalization of the bispectrum. Since
FFTs are a computational component of many scientific and
image processing algorithms, extensive work has been done
on their acceleration. However, despite these efforts,
and even when computed with accelerated FFT
cores, the speckle algorithm would still be too slow to be of
practical use. Thus, embodiments of the speckle algorithm in
the invention include two methods: accelerating the calculation
of the bispectrum; and accelerating normalization of the
bispectrum. To implement these two methods, embodiments
of the invention include two hardware accelerator units: (1) a
Bispectrum Computation Unit (BCU); and (2) a Bispectrum
Normalization Unit (BNU). Moreover, a third manner of
accelerating the speckle algorithm in embodiments of the
invention is a non-obvious reformulation of the speckle algorithm
itself. The following paragraphs will further detail the
above-discussed embodiments of the invention.

The Bispectrum Computation Unit (BCU) is part of the
portion of the speckle algorithm that computes the “Average
Power Spectrum” stage of the speckle algorithm flow diagram
100 (e.g., see FIG. 1). A rapid calculation of the bispectrum
is at the core of accelerating the speckle algorithm and is
a major contributor to providing a real-time implementation
of the speckle algorithm. In particular, the BCU helps convert
data from the frequency domain into the bispectrum domain.
The mathematical details of this conversion process can be
found, for example, in: “Speckle Imaging of Satellites at the
Air Force Maui Station” by T. W. Lawrence, D. M. Goodman,
E. M. Johansson, and J. P. Fitch, which was presented at the European
Southern Observatory (ESO) Conference on High Resolution
Imaging By Interferometry II, Garching, Fed. Republic
of Germany, 14-18 Oct. 1991.
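
For orientation, the bispectrum is commonly defined in the speckle imaging literature as a triple product of Fourier components, averaged over the frames of a sequence. The form below is that textbook definition and is not quoted from this description; the symbols I_n (Fourier transform of frame n), u and v (spatial frequencies), and N (number of frames) are notation introduced here for illustration.

```latex
B_n(\mathbf{u}, \mathbf{v}) = I_n(\mathbf{u})\, I_n(\mathbf{v})\, I_n^{*}(\mathbf{u} + \mathbf{v}),
\qquad
\langle B(\mathbf{u}, \mathbf{v}) \rangle = \frac{1}{N} \sum_{n=1}^{N} B_n(\mathbf{u}, \mathbf{v})
```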

Novel embodiments of the invention provide a method and
system capable of performing the bispectrum computations
in hardware, making significant speed gains possible. In particular,
FIG. 3A is more than a simple mapping of an algorithm
from software to hardware; instead, the BCU represents
a new way of computing the bispectrum results. It is
analogous to the “butterfly” used in FFT computations that
enables efficient implementation of the FFT core operations:
the BCU represents a core capability that provides a repeatable
computational unit that yields speed improvements in the
implementation of the speckle algorithm.

In particular, the BCU of this embodiment of the invention
takes several inputs (i.e., see the left side of FIG. 3A), representing
various intermediate calculations internal to the
bispectrum computation process. Each of these inputs consists
of both a real and an imaginary portion and is passed into
the BCU. In implementations of the BCU, the operations
indicated in FIG. 3A must be performed multiple times on
each pixel. This would pose a tremendous computational
burden on standard microprocessor implementations. In
order to alleviate this computational burden, embodiments of
the invention utilize a custom hardware processor that is
completely pipelined (i.e., one result per cycle) and further
comprises several computational pipelines that can be implemented
in a single FPGA or other hardware platforms such as,
but not limited to: graphics processing units (GPUs), multi-core
processors, Single-Instruction-Multiple-Data (SIMD)
digital signal processors (DSPs), physics acceleration
engines and custom floating-point acceleration cards.

The BCU of FIG. 3A carries out the bulk of the frequency-to-bispectrum
domain conversion, ultimately producing
another complex intermediate result having real (crqcqcr.re)
and imaginary (crqcqcr.im) parts. From these complex intermediate
results, crqcqcr, the actual bispectrum used by the
speckle algorithm is generated. Since intermediate operations
must be performed multiple times on each pixel, and each one
can take several hundred cycles to complete, standard microprocessor
implementations are unable to compute these
results in real time. To alleviate this burden, embodiments of
the invention utilize a custom hardware processor that is completely
pipelined (i.e., one result per cycle), and several of
these computational pipelines can be implemented in a single
FPGA or other hardware platforms (e.g., graphics processing
units). In this manner, embodiments of the invention exploit
the computational parallelism inherent to the algorithm and
thus accelerate the bispectrum computations and enable real-time
speckle processing.
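
The kind of arithmetic such a pipeline repeats can be sketched in software. The example below is a minimal illustration, not the BCU data path of FIG. 3A: it assumes the textbook triple-product form F(u)·F(v)·conj(F(u+v)), works on a one-dimensional signal for brevity, and uses hypothetical helper names (`complex_mul`, `triple_product`, `dft`, `bispectrum_1d`). Real and imaginary parts are handled separately, as a hardware multiplier chain would handle them.

```python
import cmath


def complex_mul(a_re, a_im, b_re, b_im):
    """Multiply two complex numbers given as separate real/imaginary parts."""
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re


def triple_product(f_u, f_v, f_uv):
    """Return F(u) * F(v) * conj(F(u+v)) as a (real, imaginary) pair."""
    p_re, p_im = complex_mul(f_u.real, f_u.imag, f_v.real, f_v.imag)
    # Conjugating F(u+v) only flips the sign of its imaginary part.
    return complex_mul(p_re, p_im, f_uv.real, -f_uv.imag)


def dft(samples):
    """Naive O(n^2) discrete Fourier transform; keeps the sketch dependency-free."""
    n = len(samples)
    return [sum(samples[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def bispectrum_1d(samples):
    """Bispectrum of a 1-D signal: one triple product per frequency pair (u, v)."""
    f = dft(samples)
    n = len(f)
    return [[complex(*triple_product(f[u], f[v], f[(u + v) % n])) for v in range(n)]
            for u in range(n)]


signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
shifted = signal[-2:] + signal[:-2]  # the same signal, circularly shifted by two samples
b_original = bispectrum_1d(signal)
b_shifted = bispectrum_1d(shifted)
max_diff = max(abs(b_original[u][v] - b_shifted[u][v])
               for u in range(len(signal)) for v in range(len(signal)))
print(f"largest bispectrum change after shifting: {max_diff:.2e}")
```

Each output needs two complex multiplications (eight real multiplies and four additions), and the work grows with the number of frequency pairs, which is why a fully pipelined unit pays off. The shift check at the end illustrates a known property of the bispectrum: it is unchanged when the input is translated, up to rounding error.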

This section further discusses the Bispectrum Normalization
Unit (BNU), as shown in FIG. 3B. Once the bispectrum
has been computed by the BCU, the next step in the speckle
algorithm process is the normalization of these bispectrum
results against a reference power spectrum. As with the
bispectrum computation, this normalization step is computationally
intense, as it involves the addition and multiplication
of complex numbers, square roots and division operations, as
well as accumulation of these results. That is, though the
elements of the bispectrum normalization (addition,
multiplication, and division) are themselves mathematically
simple, the overall algorithm is a computationally intense
process that represents a tremendous bottleneck in the
speckle process.

To increase the performance of the speckle algorithm,
embodiments of the invention include both a BCU and BNU
in a Compute Bispectrum function block 1100, as shown in
FIG. 4. In FIG. 4, the BNU 1105 receives inputs and implements
the intense computations (i.e., see FIG. 3A) required
by the normalization process as inputs to the custom hardware
of the Compute Bispectrum block 1100. A Decision Function
1103 determines whether the Compute Bispectrum block
1100 makes a calculation of the forward or inverse bispectrum.
In the case of an inverse bispectrum, the outputs of the
BCU 1101 are fed as inputs to the BNU 1105, as shown in
FIG. 4. In the case of a forward bispectrum calculation, the
outputs of the BCU 1101 bypass the BNU 1105. In either
case, the outputs are stored in an output register 1107 for later
use. As with the BCU 1101, the BNU 1105 is pipelined and
several can be placed in parallel to further enhance the performance
of the system. As shown in FIG. 3B, the inputs to
the BNU (bispect.re, bispect.im, crqcqb) are input from the
left; bispect.re and bispect.im are the real and imaginary
components of the previously computed bispectrum. The reference spectrum
(crqcqb) is input as well in order to provide values to normalize
results against. In particular, the bispectrum works on the
FFT of a tile that is an Apodization windowed portion of a
demeaned, stabilized real image. Example windowing functions
include, but are not limited to: Hanning, Hann, Hamming,
Bartlett, Kaiser, Nuttall, Blackman, Gauss and Flat top
windows. After accumulation of the results, the normalized
spectrum (cgood) is output from the BNU 1105 through the
output pipeline register 1107.
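
The tile preparation that precedes the FFT can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a separable Hann window (one of the windowing functions listed above), omits image stabilization and the FFT itself, and the function names and the 4×4 sample tile are hypothetical.

```python
import math


def hann_window(n):
    """Symmetric Hann window of length n (n >= 2); the end points are zero."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def demean(tile):
    """Subtract the tile's mean intensity from every pixel."""
    count = len(tile) * len(tile[0])
    mean = sum(sum(row) for row in tile) / count
    return [[pixel - mean for pixel in row] for row in tile]


def apodize(tile):
    """Taper the tile toward its edges with a separable 2-D Hann window."""
    w_rows = hann_window(len(tile))
    w_cols = hann_window(len(tile[0]))
    return [[tile[i][j] * w_rows[i] * w_cols[j] for j in range(len(w_cols))]
            for i in range(len(w_rows))]


def prepare_tile(tile):
    """Demean first, then apply the apodization window."""
    return apodize(demean(tile))


# Hypothetical 4x4 tile of pixel intensities.
tile = [
    [10.0, 12.0, 11.0, 9.0],
    [13.0, 20.0, 22.0, 12.0],
    [11.0, 21.0, 19.0, 10.0],
    [9.0, 12.0, 10.0, 8.0],
]
prepared = prepare_tile(tile)
for row in prepared:
    print(" ".join(f"{value:7.3f}" for value in row))
```

Removing the mean suppresses the zero-frequency term, and the taper forces the tile edges to zero so that the periodic extension assumed by the FFT has no abrupt jumps at the tile boundary.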

The third aspect of accelerating the speckle algorithm is a
non-obvious reformulation of the algorithm that is discussed
in this section. In order to successfully accelerate algorithms,
it is frequently necessary to change the underlying algorithms
so that the algorithm is better structured for a desired hardware
platform, rather than a standard microprocessor-based
system. The reformulation for embodiments of the invention
consists of two novel components: (1) a code partitioning
scheme; and (2) a sliding window. These two components
will be discussed further in the following paragraphs.

Microprocessors are general-purpose computational platforms
that can easily, though perhaps inefficiently, implement
diverse computational components. In contrast to microprocessors,
hardware accelerators generally operate on “similar”
computations in order to achieve a significant acceleration,
such as the one obtained by embodiments of the invention.
For example, graphics processing units (GPUs) are single-instruction,
multiple-data (SIMD) computational engines
that work most efficiently when processing data in parallel.
Similarly, field-programmable gate arrays (FPGAs) allow
multiple parallel computational data paths in order to increase
computational performance. Thus, neither of these platforms
is very efficient when operating on the diverse computational
components of the structure of the original speckle algorithm.
Reformulating the speckle algorithm will allow the process to
be better matched to the strengths of accelerated hardware
discussed above.

The speckle algorithm was originally designed to run on a
microprocessor-based system and thus functioned as a
single program to perform all aspects of the algorithm. However,
when transitioning the algorithm to a pipelined hardware-accelerated
platform of embodiments of the invention,
a separation of the functionality of the original speckle algorithm
was developed. That is, the reformulation of embodiments of
the invention has modified the original speckle algorithm to
logically create two components: (1) setup; and (2) solve.
Each of these components can be loaded onto a hardware
acceleration platform of embodiments of the invention individually
in order to obtain the best computational performance
from the reformulation processing system. In this way,
the entire hardware platform of embodiments of the invention
can be “dedicated” to the computation of a given type of code
section (e.g., BCU, BNU, Reformulation), rather than inefficiently
utilizing the hardware of a standard microprocessor to
solve diverse computations.

FIG. 5A is an exemplary flow chart representing the software
code structure of the original speckle algorithm. In step
501 of FIG. 5A, the frame counter is set to 0. A frame of a time
sequenced image is read in step 503. Setting up the bispectrum
data in preparation for computation occurs in step 505.
In step 507, the bispectrum of the frame is computed. Step
509 involves adding the computed bispectrum to accumulated
bispectrums. The frame counter is incremented in step
511. Checking whether the frame counter equals the number
of frames to be processed occurs in step 513. While the frame
counter is less than the number of frames to be processed,
steps 503 to 513 are repeated.

Alternatively, as shown in FIG. 5A, when step 513 determines
the frame counter equals the number of frames to be
processed, the accumulated bispectrum data is set up for a
normalization process in step 515. Step 517 computes the
normalization of the accumulated bispectrum data. In step
519, the normalized data is set up for the calculation of an
inverse bispectrum. The inverse bispectrum is computed in
step 521 and the output of the inverse bispectrum occurs in
step 523.

FIG. 5B is an exemplary software flow chart representing 
the repartitioned flow diagram for the hardware-optimized 
code structure of embodiments of the invention. In particular, 
Step 530 consolidates the various set-up operations of the 
original Speckle algorithm (e.g., steps 505, 515 and 519, as 
discussed above) into a single block. This code structure 
enables embodiments of the invention to implement a more 
efficient hardware device architecture for replication of 
devices and increased processing speed for the algorithm. In 
addition, the computations for the inverse bispectrum are 
made more efficient in a similar way to that discussed above 
for the forward bispectrum. That is, several of the inner loop 
operations are put into hardware so that they may execute in 
a single cycle. 

In step 531 of FIG. 5B, the frame counter is set to 0. A 
frame of a time sequenced image is read into the algorithm in 
step 533. In step 537, the bispectrum of the frame is computed.
Step 539 involves adding the presently computed
bispectrum to previously accumulated bispectrum computations.
The frame counter is incremented in step 541. Checking
whether the frame counter equals the number of frames to be 
processed occurs in step 543. When the frame counter is less 
than the number of frames to be processed, steps 533 to 543 
are repeated. 

Alternatively, as shown in FIG. 5B, when the frame counter 
equals the number of frames to be processed, Step 547 com- 
putes the normalization of the accumulated bispectrum data. 
The inverse bispectrum is computed in step 551 and output of 
the inverse bispectrum occurs in step 553. 
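
The control flow of FIG. 5B can be sketched as a simple loop. This is an illustrative outline only: the bispectrum, normalization, and inversion stages are passed in as functions, the stand-in stages in the usage lines are deliberately trivial placeholders rather than the real computations, and all names are hypothetical.

```python
def process_batches(frames, n_frames, compute_bispectrum, normalize, invert):
    """Yield one corrected result for every n_frames input frames."""
    accumulated = None
    frame_counter = 0                           # step 531
    for frame in frames:                        # step 533
        current = compute_bispectrum(frame)     # step 537
        if accumulated is None:                 # step 539: accumulate
            accumulated = list(current)
        else:
            accumulated = [a + c for a, c in zip(accumulated, current)]
        frame_counter += 1                      # step 541
        if frame_counter == n_frames:           # step 543
            normalized = normalize(accumulated, n_frames)  # step 547
            yield invert(normalized)            # steps 551 and 553
            accumulated = None
            frame_counter = 0


# Trivial stand-ins so the control flow can be exercised end to end.
def fake_bispectrum(frame):
    return list(frame)


def fake_normalize(total, count):
    return [value / count for value in total]


def fake_invert(values):
    return values


frames = [[1, 2], [3, 4], [5, 6], [7, 8]]
results = list(process_batches(frames, 2, fake_bispectrum, fake_normalize, fake_invert))
print(results)
```

Because no set-up work is interleaved with the loop body, the per-frame path contains only the bispectrum computation and the accumulation, which are the parts suited to replication in hardware.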

The Sliding Window aspect of embodiments of the inven- 
tion is further discussed in the following paragraphs. In par- 
ticular, the second aspect of the repartitioned Speckle algo- 
rithm was the creation of a sliding window for bispectrum 
storage. FIG. 5C is an exemplary flow chart of the reparti- 
tioned Speckle Algorithm after adding a Sliding Window in a 
hardware implementation. In step 561 of FIG. 5C, the frame 
counter is set to 0. A frame of a time sequenced image is read 
in step 563. In step 567, the bispectrum of the frame is com- 
puted. Determining whether the computed bispectrum buffer 
is full with 30 accumulated bispectrum computations occurs
in step 587A.

When the buffer is NOT full in step 587A, in step 591, the
next oldest becomes the oldest computed bispectrum frame
and step 569 adds the next computed bispectrum to the previous
accumulated bispectrum computations and then proceeds
to step 587B to determine whether the buffer is now
full. In step 587B, when the buffer is not full, the frame
counter is incremented in step 571A and the method returns to
step 563.

Alternatively, when the buffer is full in step 587A, step 589 
subtracts the oldest computed bispectrum frame of the accu- 
mulated bispectrum computations from the previous 30 accu- 
mulated bispectrum computations to produce 29 previous 
accumulated bispectrum computations. Step 569 adds the 
next computed bispectrum to the previous 29 accumulated 
bispectrum computations to provide the next 30 accumulated
bispectrum computations and, in step 591, the next oldest 
becomes the oldest computed bispectrum frame. The flow 
then proceeds to step 587B to determine whether the buffer is 
now full (i.e., 30 accumulated bispectrum computations). 

In step 587B, when the buffer is full, the 30 accumulated 
bispectrums are normalized in step 577. Next, the bispectrum 
is inverted in step 581. Step 583 outputs the inverted spectrum
and step 571B increments the frame counter and the method
returns to step 563. 
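
The sliding-window bookkeeping of FIG. 5C amounts to a running sum over a fixed-length buffer. The sketch below is a minimal software analogue, assuming each bispectrum is a flat list of numbers; the class name `SlidingBispectrumSum` and its `push` method are hypothetical, and normalization and inversion are left out.

```python
from collections import deque


class SlidingBispectrumSum:
    """Running sum of the most recent `window` bispectra."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque()   # stored bispectra, oldest on the left
        self.total = None       # element-wise running sum

    def push(self, bispectrum):
        """Add the newest bispectrum; return the sum once the window is full, else None."""
        if len(self.buffer) == self.window:
            # Buffer already full: subtract the oldest entry (step 589).
            oldest = self.buffer.popleft()
            self.total = [t - o for t, o in zip(self.total, oldest)]
        self.buffer.append(list(bispectrum))
        # Add the newest entry to the accumulated sum (step 569).
        if self.total is None:
            self.total = list(bispectrum)
        else:
            self.total = [t + b for t, b in zip(self.total, bispectrum)]
        # Only a full window is handed on for normalization and inversion.
        return list(self.total) if len(self.buffer) == self.window else None


window_sum = SlidingBispectrumSum(window=3)
outputs = [window_sum.push([value]) for value in (1, 2, 3, 4, 5)]
print(outputs)
```

After the window fills, each new frame costs one subtraction and one addition per bispectrum element instead of a full re-accumulation. With floating-point data, repeated subtract-and-add can accumulate rounding error over long runs, so a practical implementation may want to rebuild the sum from the buffer periodically.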

FIG. 6A shows the computational flow of the original 
Speckle algorithm of the background art. As shown in FIG. 
6A, the bispectrum frame computation 701 is performed 30 
times for 30 consecutive frames in order to compute/process 
a first single corrected frame 703. This process of producing 
bispectrum frame computations 701 is repeated again and 
again for a next set of 30 consecutive frames before a next 
single corrected frame 703 is computed. For example, in the 
original Speckle algorithm, 1800 bispectrum frame compu- 
tations were required to generate 1 second of enhanced video 
(i.e., 30 bispectrums/frame × 60 frames/sec × 1 sec).

FIG. 6B is a computational flow diagram of the improved, 
repartitioned Speckle algorithm implemented by the embodi- 
ments of the invention. As shown in FIG. 6B, a compute (and 
store) bispectrum computation 702 is performed for an initial 
30 bispectrums. Then a single corrected frame 703 is computed.
However, as shown in FIG. 6B, in the repartitioned
Speckle algorithm of embodiments of the invention, only 1
additional compute (and store) bispectrum operation 702 is
needed to provide a next single corrected frame 703 after the
initial 30 compute (and store) bispectrum operations 702.
Thus, as can be seen
from FIG. 6B, embodiments of the invention achieve a speed-up,
in terms of computing the corrected frame 703 in comparison
with the background art Speckle algorithm, by storing
the previous 30 bispectrum computations in memory. Therefore,
for each new frame, embodiments of the invention compute
and store a next bispectrum 702 and use the 29 previously
computed bispectrums. In this manner, a single next
bispectrum 702 is computed for each single corrected frame
703 and a speed-up in producing the corrected frames 703 is
provided by embodiments of the invention.

This speed-up in the production of single corrected frames 
703 also comes with a large-scale reduction in bispectrum 
computations per single corrected frame that dramatically 
reduces the overall computations and increases the perfor- 
mance of the repartitioned Speckle algorithm in comparison 
to the original Speckle algorithm of the background art. This 
will be further demonstrated through the performance testing 
of a prototype system discussed below and by considering our 
previous example. That is, for embodiments of the invention, 
real-time processing of 1 second of video uses only 60 
bispectrum computations (i.e., 1 bispectrum/frame × 60 
frames/sec × 1 sec), as opposed to the 1800 bispectrum 
computations required by the original Speckle algorithm of the 
background art (i.e., a factor of 30 speed-up). 
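
The arithmetic behind this comparison can be checked with a short sketch. The function name is hypothetical, and the count for the repartitioned algorithm is a steady-state figure that ignores the one-time fill of the initial 30-frame window.

```python
def bispectrum_computations(frame_rate, seconds, window=30):
    """Return (original, repartitioned) bispectrum computation counts.

    original:      every corrected frame recomputes all `window` bispectra.
    repartitioned: every corrected frame computes one new bispectrum and
                   reuses the stored ones (steady state, initial fill ignored).
    """
    corrected_frames = frame_rate * seconds
    original = window * corrected_frames
    repartitioned = corrected_frames
    return original, repartitioned


original, repartitioned = bispectrum_computations(frame_rate=60, seconds=1)
speed_up = original // repartitioned  # 1800 // 60 = 30
```

In this model the ratio always equals the window length, which is why a 30-frame window gives a factor of 30.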

As discussed above, such a computational rate would not 
be possible on a standard microprocessor-based system 
because the additional memory required for such an 
approach would be prohibitive in standard microprocessor 
architectures and configurations. In addition, such an 
optimization of the original Speckle algorithm is non-intuitive, 
non-obvious and an unexpected result, since the architecture and 
configuration of this new memory design would be incompatible 
with the memory architecture of standard microprocessor-based 
systems and PCs. 

FIG. 7A and FIG. 7B are an exemplary circuit board layout 
and a detailed block diagram of a hardware system for the 
accelerated computation of the repartitioned Speckle algorithm, 
respectively. FIG. 7A shows an overview of an example of a 
circuit board layout 7002 that may be used in conjunction with 
a host PC 7001. Circuit board 7002 may contain a PCI bridge 
7003, oscillator 7004, field-programmable gate array (FPGA) 
7005, TX synthesizer 7006, transmitter 7007, and receiver 7008. 
In further detail, as shown in FIG. 7B, input/output video data 
to/from the system is provided through coaxial or other shielded 
connectors 7104, which may be part of a board 7103, which may 
further be part of another board 7102, which may be located, 
e.g., on a PMC carrier 7101. These inputs 7104 are connected to 
equalizer 7105 and cable driver 7107, respectively. The outputs 
of the equalizer 7105 and cable driver 7107 are connected to 
deserializer 7106 and serializer 7108, respectively. Outputs 
from the deserializer 7106 and serializer 7108 provide input 
signals for the framing receiver 7111 and framing transmitter 
7113, respectively; these and other components may be 
implemented on an FPGA 7110. The FPGA 7110 may be coupled to 
board 7103 via an XRM connector 7109, for example, and it may 
also be coupled to an SDRAM 7127. Outputs from the framing 
receiver 7111 and framing transmitter 7113 provide inputs to 
the decoder 7112 and encoder 7114, respectively. A startup 
function 7125 and Parameter register file 7126 control the 
operation of a Speckle Engine hardware function 7115. 

The Speckle Engine hardware function further comprises 
an extract tile function 7116 that is connected to a Demean 
function 7117 that provides inputs to an Apodization window 
7118. The outputs of the Apodization window 7118 are connected 
to a two-dimensional (2-D) Real-Complex FFT 7119. 
The properties of the FFT 7119 include but are not limited to: 
real-number inputs in the range of 0 to 1 (inclusive); 2-D FFT 
sizes of at least 64x64, 256x256, 512x512 and 1024x1024; 
and inverse transforms that take similar sizes but are 
range constrained at the output. 

In addition, the outputs of the two-dimensional Real-Complex 
FFT 7119 provide inputs to an Intensity function 7120, 
whose outputs provide the data for a bispectrum computation 
function 7121. The outputs of the bispectrum computation 
function 7121 provide the inputs to an Averaging unit 7122. 
The outputs of the Averaging unit 7122 are connected to a 
two-dimensional Complex-Real FFT 7123. The outputs of 
the two-dimensional Complex-Real FFT 7123 are connected 
to an Apodization Gain unit 7124. The apparatus may further 
incorporate a PLX 7128 that may be coupled between the 
FPGA 7110 and a host PC (not shown). 
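
The first two stages of the Speckle Engine, the Demean function and the Apodization window, can be illustrated with a small software sketch. The description does not specify the window shape, so the separable Hann window used here is an assumption, as are the function names; the FFT, Intensity, bispectrum and Averaging stages are not shown.

```python
import math


def demean(tile):
    """Subtract the tile's mean value from every pixel."""
    count = sum(len(row) for row in tile)
    mean = sum(sum(row) for row in tile) / count
    return [[value - mean for value in row] for row in tile]


def hann_window(n):
    """1-D Hann window of length n (assumed window shape, not from the source)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apodize(tile):
    """Taper a tile toward zero at its edges with a separable 2-D window."""
    row_weights = hann_window(len(tile))
    col_weights = hann_window(len(tile[0]))
    return [
        [value * row_weights[r] * col_weights[c] for c, value in enumerate(row)]
        for r, row in enumerate(tile)
    ]


# Example: a 5x5 tile with pixel values in the range 0 to 1.
tile = [[0.0, 0.25, 0.5, 0.75, 1.0] for _ in range(5)]
windowed = apodize(demean(tile))
```

Removing the mean and tapering the tile edges before the transform reduces the discontinuity at the tile boundary that the FFT would otherwise see.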

Details of an exemplary implementation of the software 
and hardware systems discussed above and shown in the 
figures above and test results of the embodiments of the 
invention are discussed in the following paragraphs. In order 
to demonstrate and test embodiments of the invention, a 
physical framework capable of capturing a variety of video 
inputs and processing them using an FPGA was assembled. 
As shown in FIG. 8A, the system consists of a PC workstation 
801 fitted with a 16 GB Celerity™ card FPGA 806 and 
advanced capture and display devices, such as the XenaLH 
high-definition capture card 805 from AJA system. In developing 
the prototype FPGA solver, we needed to build the 
computational components, state machines, and control logic 
to handle baseline processing functionality. The communication 
infrastructure required the appropriate memory controllers, 
host/solver communication channels, and access to the 
necessary I/O device. The prototype system of FIG. 8A was 
designed to meet, but is not limited to, the latest digital video 
standards used in high definition TV (HDTV): 720p resolution 
(1280x720 @ 60 frames per second). This imposes constraints 
on the processing that are far more demanding than those of 
any known background art speckle implementation, 
which typically uses, but is not limited to, sub-megapixel 
images (512x512 @ 24 or 30 frames per second) and cannot 
achieve real-time throughput. Accordingly, the prototype 
system also included AMD Opteron™ components 802 and 
803, as well as a video card 804 (in this example, an Nvidia™ 
7800 GTX video card). 
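
To give a sense of scale for the two video formats mentioned above, the raw pixel throughput of each can be compared. The sketch below uses only the resolutions and frame rates stated in the text; the function name is hypothetical, and the comparison counts pixels only, without modeling the per-pixel cost of the Speckle algorithm.

```python
def pixel_rate(width, height, frames_per_second):
    """Pixels that must be processed per second for a given video format."""
    return width * height * frames_per_second


hd_720p = pixel_rate(1280, 720, 60)       # 55,296,000 pixels/sec
sub_megapixel = pixel_rate(512, 512, 30)  # 7,864,320 pixels/sec
ratio = hd_720p / sub_megapixel           # about 7x more pixels per second
```

By this measure, the 720p target requires roughly seven times the pixel throughput of a 512x512 stream at 30 frames per second.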

An overview of the parameters of an exemplary HD Card 
is shown in FIG. 8B. The High Definition (HD) Video Capture 
Card, as shown in FIG. 8A, supports a wide range of 
video formats, which provides versatility and will lead to 
further commercial applications of this project, including in 
areas beyond speckle image processing. HD-SDI signals were 
provided as inputs into the prototype accelerated Speckle solver 
test system. After being captured by the HD card, they were 
passed to the host PC and Celerity™ FPGA board for Speckle 
algorithm processing. The “before” and “after” results are 
output on two monitors. The purely software approach 
required almost 35 seconds to generate a single frame of size 
512 by 512 pixels. However, by utilizing embodiments of the 
invention, as described above, our hardware/software 
co-processor required less than 1 second. That is, as shown in 
FIG. 9, the demonstration and test hardware/software 
co-processing solution outperformed the purely software 
approach by a factor of 40x. Further enhancement of the 
hardware solution is possible through additional, parallel 
computational pipelines to achieve an even greater speedup and 
thus enable real-time image enhancement. 

Note that the exemplary prototype system described above 
represents just one, non-limiting example of many possible 
implementations and embodiments of the invention. For 
example, the system described above was implemented in a 
host PC. However, it is also possible to implement such 
accelerated processing within an embedded platform, consisting 
of an FPGA but no host PC. Additionally, the accelerated 
processing can be performed within a graphics processing unit 
(GPU), rather than an FPGA. Furthermore, other hardware 
processing platforms, such as the Cell processor, could utilize 
the invention described above to greatly enhance the 
performance of the speckle algorithm. 

The foregoing description illustrates and describes 
embodiments of the invention. Additionally, the disclosure 
shows and describes only the preferred embodiments of the 
invention, but as mentioned above, it is to be understood that 
the invention is capable of use in various other combinations, 
modifications, and environments and is capable of changes or 
modifications within the scope of the inventive concept as 
expressed herein, commensurate with the above teachings 
and/or skill or knowledge of the relevant art. The embodiments 
described hereinabove are further intended to explain 
best modes known of practicing the invention and to enable 
others skilled in the art to utilize the invention in such or other 
embodiments and with the various modifications required by 
the particular applications or uses of the invention. 
Accordingly, the description is not intended to limit the 
invention to the form or application disclosed herein. Also, it 
is intended that the appended claims be construed to include 
alternative embodiments. 


What is claimed is: 

1. A method for real-time computation of a Speckle algo- 
rithm incorporating a sliding window, comprising: 

initializing and setting up a plurality of memory locations; 
setting a frame counter to zero; 
inputting a present frame of a time sequenced image; 
computing a bispectrum of the present frame; 
determining whether a computed bispectrum buffer is full 
with accumulated bispectrum computations and per- 
forming the following operations: 
when the computed bispectrum buffer is not full, 
setting a next oldest computed bispectrum frame to be 
an oldest computed bispectrum frame, 
adding the computed bispectrum of the present frame 
to the previous accumulated bispectrum frame 
computations and, 

if the computed bispectrum buffer is still not full, 
incrementing the frame counter and return to the step of 
inputting the present frame; 
and when the computed bispectrum buffer is full, 
subtracting the oldest computed bispectrum frame 
from the accumulated bispectrum frame computa- 
tions, 

setting a next oldest computed bispectrum frame to be 
the oldest computed bispectrum frame, 
adding the computed bispectrum of the present frame 
to the previous accumulated bispectrum frame 
computations and, 
if the buffer is still not full, 

incrementing the frame counter and return to the step of 
inputting the present frame; 
and when the computed bispectrum buffer is full, 
normalizing the accumulated computed bispectrums; 
inverting the normalized accumulated bispectrums; 
outputting the normalized accumulated inverted bispec- 
trums; 

incrementing the frame counter; and 

returning to the step of inputting the present frame. 

2. A system for real-time computation of the method as 
recited in claim 1, the system comprising: 

a host computer; 
a PCI bridge; 
an oscillator; 
a transmitter synthesizer; 
a serializer; 
a receiver; and 
an FPGA, 

wherein at least one of the host computer or the FPGA, 
either alone or in combination, is configured for: 
said computing a bispectrum of the present frame; and 
said determining whether a computed bispectrum buffer is 
full with accumulated bispectrum computations and 
performing the following operations: 
when the computed bispectrum buffer is not full, 
setting a next oldest computed bispectrum frame to be 
an oldest computed bispectrum frame, 
adding the computed bispectrum of the present frame 
to the previous accumulated bispectrum frame 
computations and, 

if the computed bispectrum buffer is still not full, 
incrementing the frame counter and returning to the 
step of inputting the present frame; 
and when the computed bispectrum buffer is full, 
subtracting the oldest computed bispectrum frame 
from the accumulated bispectrum frame computations, 
setting a next oldest computed bispectrum frame to be 
the oldest computed bispectrum frame, 
adding the computed bispectrum of the present frame 
to the previous accumulated bispectrum frame 
computations and, 

if the buffer is still not full, incrementing the frame 
counter and returning to the step of inputting the 
present frame; 

and when the computed bispectrum buffer is full, 
normalizing the accumulated computed bispectrums; 
inverting the normalized accumulated bispectrums; 
outputting the normalized accumulated inverted bispec- 
trums; 

incrementing the frame counter; and 

returning to the step of inputting the present frame. 

3. The system of claim 2, further comprising: 

input and output shielded connectors configured to input 
and output video data; 

an equalizer and cable driver connected to the input and 
output shielded connectors; and 
a deserializer, 

wherein the deserializer and serializer are connected to 
respective outputs of the equalizer and cable driver. 

4. The system of claim 3, wherein the FPGA further comprises: 

a framing receiver and framing transmitter connected to the 
outputs from the deserializer and serializer; and 
a decoder and encoder connected to the respective outputs 
from the framing receiver and framing transmitter; and 
an SDRAM and the PCI Bridge connected to the FPGA, 
wherein the deserializer is connected to the framing 
receiver and the serializer is connected to the framing 
transmitter. 

5. The system of claim 4, wherein the FPGA comprises a 
Speckle engine and the Speckle engine further comprises: 

a startup function and parameter register file configured to 
control system operation; 

an extract tile function connected to a Demean function and 
configured to provide inputs to an Apodization window; 
a first two-dimensional (2-D) Real-Complex Fast Fourier 
Transform (FFT) connected to the outputs of the 
Apodization window; 

an Intensity function connected to the 2-D Real-Complex 
FFT; 

a Compute Bispectrum function connected to outputs of 
the Intensity function; 

an Averaging Unit connected to outputs of the Compute 
Bispectrum function; 

a second two-dimensional (2-D) Real-Complex FFT connected 
to outputs of the Averaging Unit; and 
an Apodization Gain unit connected to outputs of the second 
two-dimensional (2-D) Real-Complex FFT, 
wherein the startup function is connected to the parameter 
register file, and the Speckle engine, the PCI bridge is 
connected to the startup function and the SDRAM is 
connected to the Speckle engine. 

6. The system of claim 5, wherein at least one of the 
2D-FFTs is configured to process real-number inputs in a 
range of at least 0 to 1, inclusive. 

7. The system of claim 5, wherein the FFT sizes are at least 
64x64, 256x256, 512x512 and 1024x1024. 

8. The system of claim 5, wherein inverse FFT sizes are at 
least 64x64, 256x256, 512x512 and 1024x1024 and a range 
is constrained at the output. 

9. A non-transitory machine-readable medium containing 
executable instructions that, when executed by a machine, 
cause the machine to implement a method for real-time com- 
putation of a Speckle algorithm incorporating a sliding win- 
dow, comprising: 

initializing and setting up a plurality of memory locations; 
setting a frame counter to zero; 
inputting a present frame of a time sequenced image; 
computing a bispectrum of the present frame; 
determining whether a computed bispectrum buffer is full 
with accumulated bispectrum computations and 
performing the following operations: 
when the computed bispectrum buffer is not full, 
setting a next oldest computed bispectrum frame to be 
an oldest computed bispectrum frame, 
adding the computed bispectrum of the present frame 
to the previous accumulated bispectrum frame 
computations and, 

if the computed bispectrum buffer is still not full, 
incrementing the frame counter and return to the step of 
inputting the present frame; 
and when the computed bispectrum buffer is full, 
subtracting the oldest computed bispectrum frame 
from the accumulated bispectrum frame computa- 
tions, 

setting a next oldest computed bispectrum frame to be 
the oldest computed bispectrum frame, 
adding the computed bispectrum of the present frame 
to the previous accumulated bispectrum frame 
computations and, 
if the buffer is still not full, 

incrementing the frame counter and return to the step of 
inputting the present frame; 
and when the computed bispectrum buffer is full, 
normalizing the accumulated computed bispectrums; 
inverting the normalized accumulated bispectrums; 
outputting the normalized accumulated inverted bispec- 
trums; 

incrementing the frame counter and returning to the step of 
inputting the present frame. 





